A Practical Part-of-Speech Tagger

Authors

  • Douglas R. Cutting
  • Julian Kupiec
  • Jan O. Pedersen
  • Penelope Sibun
Abstract

We present an implementation of a part-of-speech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations which result in high-speed operation. Three applications for tagging are described: phrase recognition; word sense disambiguation; and grammatical function assignment.
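
As an illustration of what tagging with a hidden Markov model involves, the sketch below decodes the most probable tag sequence for a short sentence. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not spell out the decoding procedure, so standard Viterbi decoding is assumed here; the tag set, the probability tables, and the `viterbi` function name are all hypothetical; and the training step on unlabeled text that the abstract mentions is not shown.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable tag sequence for `words` under a first-order HMM."""
    if not words:
        return []

    def lp(p):
        # Log-probability with a small floor so unseen events stay finite.
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path ending in tag t at word i.
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = []  # back[i][t]: best predecessor of tag t at word i + 1
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags,
                       key=lambda p: best[-1][p] + lp(trans_p[p].get(t, 0.0)))
            scores[t] = (best[-1][prev]
                         + lp(trans_p[prev].get(t, 0.0))
                         + lp(emit_p[t].get(word, 0.0)))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy model: every number below is invented for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.2},
    "VERB": {"dog": 0.1, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
```

In this toy model the lexicon's role is played by the emission table `EMIT`, which lists the tags each word may take. Working in log space avoids numeric underflow on longer sentences.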

Similar resources

Porting a Stochastic Part-of-Speech Tagger to Swedish

The Xerox Part-of-Speech Tagger (XPOST) claims to be practical. One aspect of practicality as defined here is reusability. Thus it is meant to be easy to port XPOST to a new language. To test this, XPOST was ported to Swedish. This port is described and evaluated. In previous work on part-of-speech tagging, a practical part-of-speech tagger was defined as one with the following set o...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

The Grammar of Sense: Using part-of-speech tags as a first step

This paper describes two experiments: one exploring the amount of information relevant to sense disambiguation contained in the part-of-speech field of entries in a Machine Readable Dictionary (MRD); the other, more practical, experiment attempts sense disambiguation of all content words in a text assigning MRD homographs as sense tags using only part-of-speech information. We have implemented a ...

Part of Speech Tagging with Mixed Approaches of Neural Networks and Transformation Rules

For the purpose of constructing a practical part of speech tagger that uses as few training data as possible, an approach using neural networks, which uses different lengths of contexts based on longest context priority and takes into account the maximization of information amount, have been proposed so far. To further improve the tagging performance, this paper proposes an integrated approach o...

The Grammar of Sense: Using Part-of-Speech Tags as a First Step

This paper describes two experiments: one exploring the amount of information relevant to sense disambiguation contained in the part-of-speech field of entries in a Machine Readable Dictionary (MRD). Another, more practical, experiment attempts sense disambiguation of all open class words in a text assigning MRD homographs as sense tags using only part-of-speech information. We have implemented ...

Publication date: 1992